DWS-AQA: A Cost Effective Approach for Very Large Data Warehouses

Authors

  • Jorge Bernardino
  • Pedro Furtado
  • Henrique Madeira
Abstract

Data warehousing applications typically involve massive amounts of data that push database management technology to the limit. A scalable architecture is crucial, not only to handle very large amounts of data but also to ensure interactive response times for users. Large data warehouses require a very expensive setup, typically based on high-end servers or high-performance clusters. In this paper we propose and evaluate a simple but very effective method to implement a data warehouse using the computers and workstations typically available in large organizations. The proposed approach is called data warehouse striping with approximate query answering (DWS-AQA). The goal is to use the processing and disk capacity normally available in large workstation networks to implement a data warehouse with a very low infrastructure cost. Because the data warehouse shares computers that are also used for other purposes, most of the time only a fraction of the computers will be able to execute the partial queries in time. However, as we show in the paper, the approximate answers estimated from partial results have a very small error for most plausible scenarios. Moreover, because the data warehouse facts are partitioned in a strictly uniform way, it is possible to calculate tight confidence intervals for the approximate answers, providing the user with a measure of the accuracy of the query results. A set of experiments on the TPC-H benchmark database is presented to show the accuracy of DWS-AQA for a large number of
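The idea of estimating a full answer from the nodes that respond in time can be illustrated with a small sketch. It is not the paper's implementation: the function names `stripe_round_robin` and `estimate_sum` are hypothetical, and the confidence interval uses a textbook survey-sampling estimator (the responding nodes are treated as a simple random sample without replacement, with a normal approximation) as an assumption, since the abstract does not give the authors' formulas.

```python
import math
import random
import statistics


def stripe_round_robin(rows, n_nodes):
    """Assign row i to node i % n_nodes, so every node holds a uniform slice."""
    nodes = [[] for _ in range(n_nodes)]
    for i, row in enumerate(rows):
        nodes[i % n_nodes].append(row)
    return nodes


def estimate_sum(partial_sums, n_nodes, z=1.96):
    """Estimate SUM over all nodes from the nodes that answered in time.

    Returns (estimate, half_width). Assumption: the responding nodes behave
    like a simple random sample of the n_nodes nodes; z=1.96 gives an
    approximate 95% interval under a normal approximation.
    """
    k = len(partial_sums)
    if k == 0:
        raise ValueError("no node answered")
    estimate = n_nodes * statistics.fmean(partial_sums)
    if k == n_nodes:
        return estimate, 0.0       # every node answered: the result is exact
    if k < 2:
        return estimate, math.inf  # one sample gives no variance estimate
    s2 = statistics.variance(partial_sums)  # sample variance of node sums
    fpc = 1 - k / n_nodes                   # finite population correction
    half_width = z * n_nodes * math.sqrt(fpc * s2 / k)
    return estimate, half_width


if __name__ == "__main__":
    rng = random.Random(0)
    facts = [rng.uniform(0, 100) for _ in range(100_000)]  # synthetic measures
    nodes = stripe_round_robin(facts, 20)
    answered = rng.sample(range(20), 12)  # only 12 of 20 nodes reply in time
    est, hw = estimate_sum([sum(nodes[i]) for i in answered], 20)
    print(f"estimate={est:.0f} +/- {hw:.0f}, exact={sum(facts):.0f}")
```

Because round-robin striping gives every node a nearly identical slice of the facts, the per-node partial sums vary little, which is what keeps the interval narrow in this sketch.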

Similar resources

A middle layer for distributed data warehouses using the DWS-AQA technique

The DWS (Data Warehouse Striping) technique is a round-robin data partitioning approach especially designed for distributed data warehouse environments. In DWS the fact tables are distributed across an arbitrary number of computers and the queries are executed in parallel by all the computers, guaranteeing a nearly optimal speed up and scale up. This technique is combined with an approximate query a...

Efficient Data Distribution for DWS

The DWS (Data Warehouse Striping) technique is a data partitioning approach especially designed for distributed data warehousing environments. In DWS the fact tables are distributed across an arbitrary number of low-cost computers and the queries are executed in parallel by all the computers, guaranteeing a nearly optimal speed up and scale up. Data loading in data warehouses is typically a heavy pr...

Scalable Maintenance of Multiple Interrelated Data Warehousing Systems

The maintenance of data warehouses (DWs) is becoming an increasingly important topic due to the growing use, derivation and integration of digital information. Most previous work has dealt with one centralized data warehouse only. In this paper, we focus on environments with multiple DWs that are possibly derived from other DWs. In such a large-scale environment, data updates from base sourc...

Requirements Engineering for Data Warehouses

Data Warehouses (DWs) aim at supporting the decision-making process of an organization. In the Requirements Engineering (RE) domain, several methods were proposed for the development of DWs, most of them based on the Goal-Oriented Requirements Engineering (GORE) approach. However, there is not yet a comprehensive and unified perspective of the various methods proposed. In this paper, a coherent...

Data Warehouse Striping: Improved Query Response Time

The increasing use of decision support systems has led to an explosion in the amount of business information that must be managed by data warehouses. Therefore, data warehouses must have efficient Online Analytical Processing (OLAP) that provides tools to satisfy the information needs of business managers, helping them to make faster and more effective decisions. Improving query response time i...

Publication date: 2002